Abstract

A lossy data compression scheme for uniformly biased Boolean messages is investigated via
statistical mechanics techniques. We utilize a tree-like committee machine (committee tree) and
a tree-like parity machine (parity tree) whose transfer functions are non-monotonic. The scheme
performance at the infinite code length limit is analyzed using the replica method. Both committee
and parity tree-like networks are shown to saturate the Shannon bound. The AT stability of the
replica symmetric solution is analyzed, and the tuning of the non-monotonic transfer function is
also discussed.

I. INTRODUCTION



The tools of statistical mechanics have been successfully applied to several problems
of information theory in recent years. In particular, in the fields of error correcting codes
[1], [2], [3], [4], spreading codes [5], [6], and compression codes [7], [8], [9], [10], [11],
statistical mechanical techniques have shown great potential. The present paper uses similar
techniques to investigate a lossy compression scheme. Lossless compression, first studied in
the pioneering paper of Shannon [12], has been widely investigated for many years. After
much effort, a set of very good codes has been designed and practical implementations have
been proposed [13], [14], [15]. Lossy compression, on the other hand, was first studied in
another paper of Shannon [16]. Many practical lossy compression schemes have been developed
over the years (for example JPEG and MPEG compression), but at present none of these
schemes saturates the Shannon bound given by the rate-distortion theorem. Nevertheless,
several theoretical schemes which reach this optimal bound have already been proposed.
Recently, Shannon optimal codes based on sparse systems have been discovered [7], [17], [18],
[19], and the general tendency is now to use such systems. These codes saturate the Shannon
bound asymptotically (i.e., for an infinite codeword length) and in the dense generating
matrix limit (although a low-connectivity sparse matrix already gives near-Shannon
performance). However, there is still a lot of work to be done for densely connected systems.
One such system is obtained by using a perceptron-based decoder. There have been some recent
studies on the encoding problem of such schemes using the belief propagation (BP) algorithm,
and the results seem promising [20]. The foundations of this encoding method for lossy
compression schemes were originally put forward by Murayama, using the TAP equations applied
to Sourlas-type codes [8]. Before fully investigating the encoding problem, it is important
to study a wide class of decoders in order to extract a pool of schemes which can give
near-Shannon-bound performance. The study of such schemes could give interesting clues on how
the lossy compression process works, and it might also help to pinpoint some essential
features a scheme should possess in order to achieve Shannon optimal performance.

This paper extends the framework introduced in [10], [11] and studies three different
decoders based on a non-monotonic multilayer perceptron. Hosaka et al. studied the simple
perceptron network featuring a non-monotonic transfer function in order to have a mirror
symmetry property in their model (i.e., $f(u) = f(-u)$). This was motivated by the belief
that the Edwards-Anderson order parameter should be zero to reach the Shannon bound.
Consequently, if one codeword $\mathbf{s}$ is optimal (here optimal denotes a codeword which
gives the minimum achievable distortion for the scheme concerned), $-\mathbf{s}$ is also
optimal. They then showed that, for an infinite codeword length, their scheme saturates the
Shannon bound. Next, one interesting feature of the model proposed by Mimura et al. [11]
is that it increases the number of optimal codewords by using a multilayer decoder network.
The number of optimal codewords is a function of the number of hidden units $K$ in the decoder
network (for example, in their parity-tree model with an even number of hidden units, there
are at least $2^{K-1}$ optimal codewords). Thus, one can expect that finding an optimal
codeword becomes easier as the number of hidden units increases. Nevertheless, their
model deals only with unbiased messages. The main advantage of the model proposed in this
paper is that it combines the benefits of the model of Hosaka et al. (mirror symmetry and the
ability to deal with biased messages) with the benefits of the model of Mimura et al. (a
number of optimal codewords that increases with the number of hidden units). By studying
three different schemes we would like to extract some essential characteristics a good lossy
compression framework should possess. Finally, the Almeida-Thouless (AT) stability of the
obtained solutions is also studied; the solutions turn out to have almost no unstable part.

The paper is organized as follows. Section II introduces the framework of lossy compression.
Section III presents our model. Section IV presents the mathematical tools used to
evaluate the performance of the present scheme. Section V states some results concerning
the validity of the obtained solutions, and Section VI is devoted to conclusion and discussion.

II. LOSSY COMPRESSION 



Let us begin by introducing the framework of lossy data compression [21]. Let $y$ be a
discrete random variable defined on a source alphabet $\mathcal{Y}$. An original source message
is composed of $M$ random variables, $\mathbf{y} = (y^1, \ldots, y^M) \in \mathcal{Y}^M$, and
is compressed into a shorter expression. The encoder compresses the original message
$\mathbf{y}$ into a codeword $\mathbf{s}$, using the transformation
$\mathbf{s} = \mathcal{F}(\mathbf{y}) \in \mathcal{S}^N$, where $N < M$. The decoder maps this
codeword $\mathbf{s}$ onto the decoded message $\hat{\mathbf{y}}$, using the transformation
$\hat{\mathbf{y}} = \mathcal{G}(\mathbf{s}) \in \hat{\mathcal{Y}}^M$. The encoding/decoding
scheme can be represented as in Figure 1. In this case, the code rate is defined by $R = N/M$.
FIG. 1: Rate distortion encoder and decoder.

A distortion function $d$ is defined as a mapping
$d : \mathcal{Y} \times \hat{\mathcal{Y}} \to \mathbb{R}^{+}$. To each possible pair
$(y, \hat{y})$ it associates a non-negative real number. In most cases, the reproduction
alphabet $\hat{\mathcal{Y}}$ is the same as the alphabet $\mathcal{Y}$ on which the original
message $\mathbf{y}$ is defined.

Hereafter, we set $\hat{\mathcal{Y}} = \mathcal{Y}$, and we use the Hamming distortion as the
distortion function of the scheme. This distortion function is given by

$$d(y, \hat{y}) = \begin{cases} 0, & \hat{y} = y, \\ 1, & \hat{y} \neq y, \end{cases} \tag{1}$$

so that the quantity $d(\mathbf{y}, \hat{\mathbf{y}}) = \sum_{\mu=1}^{M} d(y^\mu, \hat{y}^\mu)$
measures how far the decoded message $\hat{\mathbf{y}}$ is from the original message
$\mathbf{y}$. In other words, it records the error made on the original message during the
encoding/decoding process. The probability of error distortion can be written
$E[d(y^\mu, \hat{y}^\mu)] = P[y^\mu \neq \hat{y}^\mu]$, where $E$ represents the expectation.
Therefore, the distortion associated with the code is defined as
$D = E[\frac{1}{M} d(\mathbf{y}, \hat{\mathbf{y}})]$, where the expectation is taken with
respect to the probability distribution $P[\mathbf{y}, \hat{\mathbf{y}}]$. $D$ corresponds to
the average error per variable $y^\mu$. We now define a rate distortion pair $(R, D)$, and we
say that this pair is achievable if there exists a coding/decoding pair
$(\mathcal{F}, \mathcal{G})$ such that $E[\frac{1}{M} d(\mathbf{y}, \hat{\mathbf{y}})] \le D$
in the limit $M \to \infty$ and $N \to \infty$ (note that the rate $R$ is kept finite).

The optimal compression performance that can be obtained in the framework of lossy
compression is given by the so-called rate distortion function $R(D)$, which gives the best
achievable code rate $R$ as a function of $D$ [21]. However, although the best achievable
performance is known, no clues are given about how to construct such an optimal compression
scheme. Moreover, finding an explicit expression for the rate distortion function is, in
general, not possible.

Nonetheless, for the special case of uniformly biased Boolean messages in which each
component is generated independently by the same probability distribution
$P[y = 0] = 1 - P[y = 1] = p$, it is possible to calculate analytically the rate distortion
function $R(D)$, which becomes

$$R(D) = H_2(p) - H_2(D), \tag{2}$$

where $H_2(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$. In the sequel, we restrict ourselves to
this particular case (i.e., $P[y = 0] = 1 - P[y = 1] = p$ and $\mathcal{Y} = \{0, 1\}$).
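As a quick numerical illustration of equation (2), the following minimal Python sketch evaluates the binary entropy and the rate distortion function. The function names are illustrative only, and the clamping of the rate to zero for $D \ge \min(p, 1-p)$ is the standard convention for a Bernoulli source under Hamming distortion rather than something stated in the text above.

```python
import math


def binary_entropy(x: float) -> float:
    """H_2(x) in bits, using the convention 0 * log2(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def rate_distortion(p: float, D: float) -> float:
    """R(D) = H_2(p) - H_2(D) for a Bernoulli(p) source, Hamming distortion.

    Assumption (standard convention): the rate is zero once the tolerated
    distortion D reaches min(p, 1 - p).
    """
    if D >= min(p, 1.0 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(D)


if __name__ == "__main__":
    for p, D in [(0.5, 0.1), (0.2, 0.1), (0.2, 0.05)]:
        print(f"p={p}, D={D}: R(D) = {rate_distortion(p, D):.4f} bits/symbol")
```

A lossless code for the unbiased source corresponds to `rate_distortion(0.5, 0.0)`, which equals one bit per symbol.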

III. COMPRESSION USING NON-MONOTONIC MULTILAYER PERCEPTRONS

In this section we introduce our compression scheme. To make the calculations compatible
with the statistical mechanics framework, let us map the Boolean representation $\{0, 1\}$ to
the Ising representation $\{-1, 1\}$ by means of the mapping $\sigma = (-1)^{\rho}$, where
$\sigma$ is the Ising variable and $\rho$ is the Boolean one. On top of that, we set
$\mathcal{Y} = \mathcal{S} = \hat{\mathcal{Y}} = \{-1, 1\}$. Since we consider that all the
$y^\mu$ are generated independently by an identical biased binary source, we can easily write
the corresponding probability distribution,

$$P[y^\mu] = p\,\delta(1 - y^\mu) + (1 - p)\,\delta(1 + y^\mu). \tag{3}$$

Next we define the decoder of the compression scheme. We use a non-linear transformation
$\mathcal{G} : \mathcal{S}^N \to \hat{\mathcal{Y}}^M$ which associates a codeword
$\mathbf{s} \in \mathcal{S}^N$ with a sequence $\hat{\mathbf{y}} \in \hat{\mathcal{Y}}^M$. For a
given original message $\mathbf{y}$, the encoder is simply defined as follows,

$$\mathcal{F}(\mathbf{y}) = \underset{\mathbf{s}}{\operatorname{argmin}}\; d(\mathbf{y}, \mathcal{G}(\mathbf{s})). \tag{4}$$

For the non-linear transformation $\mathcal{G}$, we utilize non-monotonic multilayer
perceptrons. The codeword $\mathbf{s}$ is split into $K$ disjoint $N/K$-dimensional vectors
$\mathbf{s}_1, \ldots, \mathbf{s}_K \in \mathcal{S}^{N/K}$, so that $\mathbf{s}$ can be written
$\mathbf{s} = (\mathbf{s}_1, \ldots, \mathbf{s}_K)$. In this paper, we focus on three different
architectures for the non-monotonic multilayer perceptrons. They are the following:

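(I) Multilayer parity tree with non-monotonic hidden units. Its output is written

$$\hat{y}^\mu(\mathbf{s}) = \prod_{l=1}^{K} f_k\!\left(\sqrt{\frac{K}{N}}\; \mathbf{s}_l \cdot \mathbf{x}_l^\mu\right). \tag{5}$$

(II) Multilayer committee tree with non-monotonic hidden units. Its output is written

$$\hat{y}^\mu(\mathbf{s}) = \operatorname{sgn}\!\left(\sum_{l=1}^{K} f_k\!\left(\sqrt{\frac{K}{N}}\; \mathbf{s}_l \cdot \mathbf{x}_l^\mu\right)\right). \tag{6}$$

Note that in this case, if the number of hidden units $K$ is even, then the argument of the
sign function can be 0. We avoid this uncertainty by considering only an odd number of hidden
units for the committee tree with non-monotonic hidden units in the sequel.

(III) Multilayer committee tree with a non-monotonic output unit. Its output is written

$$\hat{y}^\mu(\mathbf{s}) = f_k\!\left(\sqrt{\frac{1}{K}} \sum_{l=1}^{K} \operatorname{sgn}\!\left(\sqrt{\frac{K}{N}}\; \mathbf{s}_l \cdot \mathbf{x}_l^\mu\right)\right). \tag{7}$$
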
In each of these structures, $f_k$ is a non-monotonic function of a real parameter $k$ of the
form

$$f_k(x) = \begin{cases} 1, & |x| \le k, \\ -1, & |x| > k, \end{cases} \tag{8}$$

and the vectors $\mathbf{x}_l^\mu$ are fixed $N/K$-dimensional independent vectors uniformly
distributed on $\{-1, 1\}$. The sgn function denotes the sign function, taking 1 for $x > 0$
and $-1$ for $x < 0$. Each of these architectures applies a different transformation to the
codeword $\mathbf{s}$. The general architecture of these perceptron-based decoders is shown in
Figure 2. Note that we could also consider a decoder based on a committee tree where both the
hidden units and the output unit are non-monotonic. However, this introduces an extra
parameter to tune (one threshold parameter for the hidden units and one for the output unit),
and the performance should not change drastically. For simplicity, we restrict our study to
the above three cases only.
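To make the three architectures concrete, here is a minimal Python sketch of one decoded bit $\hat{y}^\mu$ for each of (5), (6) and (7). It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, $f_k$ is taken as $+1$ for $|x| \le k$ and $-1$ otherwise (the reading of (8) that is consistent with (13)), and the tie `sgn(0)` is resolved arbitrarily to $-1$.

```python
import math


def f_k(x: float, k: float) -> int:
    """Non-monotonic transfer function: +1 inside [-k, k], -1 outside (assumed form)."""
    return 1 if abs(x) <= k else -1


def sgn(x: float) -> int:
    """Sign function; the value at x == 0 is an arbitrary choice made here."""
    return 1 if x > 0 else -1


def hidden_fields(s, x, K):
    """Local fields sqrt(K/N) * s_l . x_l for the K disjoint blocks of s and x."""
    N = len(s)
    n = N // K  # block length N/K
    scale = math.sqrt(K / N)
    return [scale * sum(s[l * n + i] * x[l * n + i] for i in range(n))
            for l in range(K)]


def parity_tree(s, x, K, k):
    """Architecture (I): product of non-monotonic hidden units."""
    out = 1
    for u in hidden_fields(s, x, K):
        out *= f_k(u, k)
    return out


def committee_tree_hidden(s, x, K, k):
    """Architecture (II): sign of the sum of non-monotonic hidden units (K odd)."""
    return sgn(sum(f_k(u, k) for u in hidden_fields(s, x, K)))


def committee_tree_output(s, x, K, k):
    """Architecture (III): sgn hidden units followed by a non-monotonic output unit."""
    total = sum(sgn(u) for u in hidden_fields(s, x, K))
    return f_k(total / math.sqrt(K), k)
```

Because $f_k$ is even, replacing $\mathbf{s}$ by $-\mathbf{s}$ leaves all three outputs unchanged, which is the mirror symmetry discussed in the introduction; for (I) and (II) the sign of any single block $\mathbf{s}_l$ can also be flipped without changing the output.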

Now let us introduce $\mathcal{H}$, an energy function of the system,

$$\mathcal{H}(\mathbf{y}, \hat{\mathbf{y}}(\mathbf{s})) = d(\mathbf{y}, \hat{\mathbf{y}}(\mathbf{s})). \tag{9}$$

This energy function $\mathcal{H}$ is clearly minimized by a codeword $\mathbf{s}$ which
satisfies equation (4). Furthermore, in the Ising representation, the Hamming distance $d$
takes the simple form

$$d(x, y) = 1 - \Theta(xy), \tag{10}$$

where $\Theta$ denotes the unit step function, which takes 1 for $x > 0$ and 0 for $x < 0$.

The encoding phase can be viewed as a classical perceptron learning problem, where one
tries to find the weight vector $\mathbf{s}$ which minimizes the energy function $\mathcal{H}$
for the original message $\mathbf{y}$ and the random input vectors $\mathbf{x}$. The vector
$\mathbf{s}$ which achieves this minimum gives us the codeword to be sent to the decoder.
Therefore, in the case of a lossless compression scheme (i.e., $D = 0$), evaluating the rate
distortion property of the present scheme is equivalent to finding the number of couplings
$\mathbf{s}$ which satisfy the input/output relation $\mathbf{x}^\mu \mapsto y^\mu$. In other
words, this is equivalent to the calculation of the storage capacity of the network [22],
[23].

FIG. 2: General architecture of the tree-like multilayer perceptron with N input units and K
hidden units.
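The encoder (4) can be sketched as an exhaustive search over all $2^N$ codewords, which is only feasible for toy sizes. The snippet below is a hypothetical illustration using the parity tree (5) as decoder; the helper names, the toy dimensions and the assumed form of $f_k$ ($+1$ for $|x| \le k$, $-1$ otherwise) are choices made for this example, not part of the original scheme description.

```python
import itertools
import math
import random


def parity_tree_bit(s, x, K, k):
    """One decoded bit of the parity tree (5), with f_k = +1 on [-k, k] (assumed)."""
    N = len(s)
    n = N // K
    out = 1
    for l in range(K):
        u = math.sqrt(K / N) * sum(s[l * n + i] * x[l * n + i] for i in range(n))
        out *= 1 if abs(u) <= k else -1
    return out


def hamming_distortion(s, y, xs, K, k):
    """Number of message bits y^mu that the decoder reproduces incorrectly."""
    return sum(1 for y_mu, x_mu in zip(y, xs)
               if parity_tree_bit(s, x_mu, K, k) != y_mu)


def brute_force_encode(y, xs, N, K, k):
    """Encoder (4) by enumeration: cost grows as 2^N, so toy sizes only."""
    best_s, best_d = None, len(y) + 1
    for s in itertools.product((-1, 1), repeat=N):
        d = hamming_distortion(s, y, xs, K, k)
        if d < best_d:
            best_s, best_d = s, d
    return best_s, best_d


if __name__ == "__main__":
    random.seed(1)
    N, K, M, k = 6, 2, 8, 0.8  # toy sizes, rate R = N / M = 0.75
    xs = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(M)]
    y = [random.choice((-1, 1)) for _ in range(M)]
    s_best, d_best = brute_force_encode(y, xs, N, K, k)
    print("codeword:", s_best, "distortion per bit:", d_best / M)
```

The exponential cost of this search is the reason the encoding problem needs dedicated algorithms.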



The choice of parity-tree based and committee-tree based networks is motivated by the
extensive literature available on this kind of network. Parity and committee machines have
been intensively studied (see [24] for an overview) by the machine-learning community over
the years. The techniques used to calculate the storage capacity of such networks give us a
starting point for our analytical evaluation of the typical performance of the above schemes.



IV. ANALYTICAL EVALUATION 

We analyze the performance of these three different compression schemes using the tools
of statistical mechanics. We first define the following partition function,

$$Z(\beta, \mathbf{y}, \mathbf{x}) = \sum_{\mathbf{s}} \exp\left[-\beta \mathcal{H}(\mathbf{y}, \hat{\mathbf{y}}(\mathbf{s}))\right], \tag{11}$$

where the sum over $\mathbf{s}$ represents the sum over all the possible states of the vector
$\mathbf{s}$, and $\beta$ denotes the inverse temperature parameter. Such a partition function
can be identified with the partition function of a spin glass system with dynamical variables
$\mathbf{s}$ and quenched variables $\mathbf{x}$. For a fixed Hamming distortion
$MD = E[d(\mathbf{y}, \hat{\mathbf{y}})]$, the average of this partition function over
$\mathbf{y}$ and $\mathbf{x}$ naturally contains all the interesting typical properties of the
scheme, such as the entropy. However, evaluating this average is hard and we need some
technique to investigate it. In this paper we use the so-called replica method in order to
calculate the average of the partition function. In the case of such a discrete system, the
entropy should not be negative, so that the zero entropy criterion (see [23]) gives us the
best achievable code rate limit. The replica calculations used to obtain the average of the
partition function $\langle Z(\beta, \mathbf{y}, \mathbf{x}) \rangle_{\mathbf{y}, \mathbf{x}}$
are detailed in Appendix A.

A. Replica symmetric solution for the parity tree with non-monotonic hidden units

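In the lossy compression scheme using the parity tree with non-monotonic hidden units (5),
the replica symmetric free energy is given by

$$f(\beta, R, k) = -\frac{1}{\beta}\Big( p \ln\left[e^{-\beta} + (1 - e^{-\beta}) A_k\right] + (1 - p) \ln\left[e^{-\beta} + (1 - e^{-\beta})(1 - A_k)\right] + R \ln 2 \Big), \tag{12}$$

where

$$A_k = \frac{1}{2} + \frac{1}{2}\left[1 - 4H(k)\right]^K, \qquad H(k) = \int_k^{+\infty} \frac{e^{-t^2/2}}{\sqrt{2\pi}}\, dt. \tag{13}$$

The internal energy is

$$u(\beta, k) = p\, \frac{e^{-\beta}(1 - A_k)}{e^{-\beta} + (1 - e^{-\beta}) A_k} + (1 - p)\, \frac{e^{-\beta} A_k}{e^{-\beta} + (1 - e^{-\beta})(1 - A_k)}. \tag{14}$$

Minimizing the free energy with respect to $A_k$, taking the zero temperature limit
$\beta \to \infty$ and using the identity (A6) gives

$$A_k = \frac{p - D}{1 - 2D}, \tag{15}$$

$$D = \frac{e^{-\beta}}{1 + e^{-\beta}}. \tag{16}$$

Finally, using the zero entropy criterion, one can get

$$R = H_2(p) - H_2(D), \tag{17}$$

which is identical to the rate-distortion function (2).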
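The step from (12)-(16) to (17) can be checked numerically. The sketch below is a minimal illustration under one stated assumption: the free energy (12) is read as $f = -\frac{1}{\beta}\big(p \ln[\cdot] + (1-p)\ln[\cdot] + R \ln 2\big)$, i.e. normalised per message bit, which is the reading under which (14) reduces to $u = D$ at the saddle point. The function name is hypothetical. It imposes $f = u$ (zero entropy) and solves for $R$.

```python
import math


def zero_entropy_rate(p: float, D: float):
    """Solve f = u for R at the saddle point (15)-(16); requires 0 < D < min(p, 1-p).

    Returns (R, u): the rate in bits per message bit and the internal energy.
    """
    beta = math.log((1.0 - D) / D)            # inverse of (16)
    e = math.exp(-beta)
    A = (p - D) / (1.0 - 2.0 * D)             # condition (15)
    a = e + (1.0 - e) * A                     # weight seen by a bit y = +1
    b = e + (1.0 - e) * (1.0 - A)             # weight seen by a bit y = -1
    u = p * e * (1.0 - A) / a + (1.0 - p) * e * A / b      # internal energy (14)
    # f = -(1/beta) * (p ln a + (1-p) ln b + R ln 2) and f = u give R:
    R = -(beta * u + p * math.log(a) + (1.0 - p) * math.log(b)) / math.log(2.0)
    return R, u


def h2(x: float) -> float:
    """Binary entropy in bits."""
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


if __name__ == "__main__":
    for p, D in [(0.2, 0.1), (0.5, 0.25), (0.3, 0.05)]:
        R, u = zero_entropy_rate(p, D)
        print(f"p={p}, D={D}: u={u:.6f}, R={R:.6f}, H2(p)-H2(D)={h2(p) - h2(D):.6f}")
```

For each pair $(p, D)$ the internal energy comes out equal to $D$ and the rate equal to $H_2(p) - H_2(D)$, in agreement with (17).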

However, this minimum is reached under the conditions given by equations (15) and (16).
Since $D$ is fixed, the condition given by equation (16) is easily satisfied by choosing the
proper inverse temperature parameter $\beta = \ln[(1 - D)/D]$. On the other hand, the condition
given by equation (15) is satisfied by properly tuning the parameter $k$ of the non-monotonic
function $f_k$. Let us denote the optimal $k$ which satisfies equation (15) by $\hat{k}$. In
the case of the parity tree, this optimal $\hat{k}$ is such that the following equation holds:

$$1 - 4H(\hat{k}) = \left(\frac{2p - 1}{1 - 2D}\right)^{1/K}. \tag{18}$$

In this paper we consider that $(p, D) \in [0, \frac{1}{2}]^2$; therefore, one can easily show
that for $p \neq \frac{1}{2}$, $\frac{2p-1}{1-2D}$ is negative, which implies that there is no
real solution of the above equation if we have an even number of hidden units $K$ (because of
the $K$-th root). However, we can also consider the case where
$(p, D) \in [\frac{1}{2}, 1] \times [0, \frac{1}{2}]$ without any change (the probabilities of
$y = 1$ and $y = -1$ are just exchanged), and in this case $\frac{2p-1}{1-2D}$ is positive,
which implies that there is always a solution for any value of $K$. The above problem is just
a consequence of the definition of $p$, and is not related to the model. So in the case of the
parity tree, $\hat{k}$ always exists independently of the value of $K$.
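The tuning of $k$ for the parity tree can be sketched numerically. The code below is an illustrative sketch, not the authors' procedure: it combines (13) and (15) into $[1 - 4H(\hat{k})]^K = (2p-1)/(1-2D)$, takes the real $K$-th root (which exists for negative ratios only when $K$ is odd), and finds $\hat{k}$ by bisection on the Gaussian tail $H(k) = \frac{1}{2}\operatorname{erfc}(k/\sqrt{2})$. Function names and the bisection bracket are assumptions of this example.

```python
import math


def gaussian_tail(k: float) -> float:
    """H(k) = integral from k to infinity of the standard normal density."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def parity_optimal_k(p: float, D: float, K: int) -> float:
    """Solve [1 - 4 H(k)]^K = (2p - 1) / (1 - 2D) for k >= 0 by bisection."""
    ratio = (2.0 * p - 1.0) / (1.0 - 2.0 * D)
    if ratio < 0.0 and K % 2 == 0:
        raise ValueError("no real K-th root: use p >= 1/2 or an odd K")
    root = math.copysign(abs(ratio) ** (1.0 / K), ratio)  # real K-th root
    target = (1.0 - root) / 4.0                           # required value of H(k)
    lo, hi = 0.0, 40.0                                    # H decreases from 1/2 to ~0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gaussian_tail(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def A_k(k: float, K: int) -> float:
    """Equation (13)."""
    return 0.5 + 0.5 * (1.0 - 4.0 * gaussian_tail(k)) ** K


if __name__ == "__main__":
    for p, D, K in [(0.5, 0.1, 2), (0.8, 0.1, 3), (0.2, 0.1, 3)]:
        k_hat = parity_optimal_k(p, D, K)
        print(f"p={p}, D={D}, K={K}: k_hat={k_hat:.4f}, "
              f"A_k={A_k(k_hat, K):.4f}, target={(p - D) / (1 - 2 * D):.4f}")
```

For unbiased messages the condition reduces to $H(\hat{k}) = 1/4$, i.e. $\hat{k} \approx 0.674$ whatever the value of $K$.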

Finally, since we used the replica symmetric (RS) ansatz, we have to verify the Almeida- 
Thouless (AT) stability of the solution to confirm its validity. This is done in the next 
section. 



B. Replica symmetric solution for the committee tree with non-monotonic hidden 
units 

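In the lossy compression scheme using the committee tree with non-monotonic hidden units
(6), the replica symmetric free energy is given by

$$f(\beta, R, k) = -\frac{1}{\beta}\Big( p \ln\left[e^{-\beta} + (1 - e^{-\beta}) B_k\right] + (1 - p) \ln\left[e^{-\beta} + (1 - e^{-\beta})(1 - B_k)\right] + R \ln 2 \Big), \tag{19}$$

where

$$B_k = \sum_{\tau_l = \pm 1} \Theta\!\left(\sum_{l=1}^{K} \tau_l\right) \prod_{l=1}^{K} \frac{1 + \tau_l\left(1 - 4H[k]\right)}{2}. \tag{20}$$

The sum over $\tau$ represents the sum over each possible state of the dummy variables
$\tau_l$ ($\tau_l = \pm 1$). The internal energy is

$$u(\beta, k) = p\, \frac{e^{-\beta}(1 - B_k)}{e^{-\beta} + (1 - e^{-\beta}) B_k} + (1 - p)\, \frac{e^{-\beta} B_k}{e^{-\beta} + (1 - e^{-\beta})(1 - B_k)}. \tag{21}$$

As in the parity tree case, after minimizing the free energy with respect to $B_k$, taking the
zero temperature limit $\beta \to \infty$ and using the identity (A6), we obtain

$$B_k = \frac{p - D}{1 - 2D}, \tag{22}$$

$$D = \frac{e^{-\beta}}{1 + e^{-\beta}}. \tag{23}$$

Finally, using the zero entropy criterion, one can get

$$R = H_2(p) - H_2(D), \tag{24}$$
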


which is identical to the rate-distortion function (2). However, here it is not easy to
discuss the existence of an optimal $k$ which satisfies the condition given by equation (22).
Such an optimal $\hat{k}$ satisfies the following equation

$$\sum_{\tau_l = \pm 1} \Theta\!\left(\sum_{l=1}^{K} \tau_l\right) \prod_{l=1}^{K} \frac{1 + \tau_l\left(1 - 4H[\hat{k}]\right)}{2} = \frac{p - D}{1 - 2D}. \tag{25}$$

We will discuss this existence problem further in the next section, when we check
the AT stability of the RS solution.
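The existence question for the committee tree with non-monotonic hidden units can be explored numerically. The following sketch is a hypothetical illustration: it evaluates $B_k$ of (20) by enumerating the $2^K$ configurations $\tau$, with each factor read as $\frac{1}{2}[1 + \tau_l(1 - 4H(k))]$, the probability that a hidden unit outputs $\tau_l$, and then solves (25) by bisection. The bisection relies on the observation that, for odd $K$, $B_k$ is the probability that a majority of independent hidden units output $+1$, which increases with $k$. All names are illustrative.

```python
import itertools
import math


def gaussian_tail(k: float) -> float:
    """H(k) = integral from k to infinity of the standard normal density."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def committee_B(k: float, K: int) -> float:
    """B_k of equation (20), by explicit enumeration of tau in {-1, +1}^K."""
    m = 1.0 - 4.0 * gaussian_tail(k)
    total = 0.0
    for tau in itertools.product((-1, 1), repeat=K):
        if sum(tau) > 0:  # Theta(sum_l tau_l); the sum is never 0 for odd K
            weight = 1.0
            for t in tau:
                weight *= (1.0 + t * m) / 2.0
            total += weight
    return total


def solve_committee_k(p: float, D: float, K: int) -> float:
    """Bisection for equation (25); K must be odd (see the remark after (6))."""
    if K % 2 == 0:
        raise ValueError("only odd K is considered for this architecture")
    target = (p - D) / (1.0 - 2.0 * D)
    lo, hi = 0.0, 40.0  # B_k increases from 0 (k = 0) to 1 (k large)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if committee_B(mid, K) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for p, D in [(0.5, 0.1), (0.2, 0.1)]:
        k_hat = solve_committee_k(p, D, 3)
        print(f"K=3, p={p}, D={D}: k_hat={k_hat:.4f}, B_k={committee_B(k_hat, 3):.4f}")
```

The enumeration costs $2^K$ operations, so this direct approach is only practical for a small number of hidden units.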






C. Replica symmetric solution for the committee tree with a non-monotonic output unit

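In the lossy compression scheme using the committee tree with a non-monotonic output unit
(7), the replica symmetric free energy is given by

$$f(\beta, R, k) = -\frac{1}{\beta}\Big( p \ln\left[e^{-\beta} + (1 - e^{-\beta}) C_k\right] + (1 - p) \ln\left[e^{-\beta} + (1 - e^{-\beta})(1 - C_k)\right] + R \ln 2 \Big), \tag{26}$$

where

$$C_k = 2^{-K} \sum_{l=0}^{K} \binom{K}{l}\, \Theta\!\left(k^2 - \frac{1}{K}(2l - K)^2\right). \tag{27}$$

The term $\binom{K}{l}$ denotes the binomial coefficient. The internal energy is

$$u(\beta, k) = p\, \frac{e^{-\beta}(1 - C_k)}{e^{-\beta} + (1 - e^{-\beta}) C_k} + (1 - p)\, \frac{e^{-\beta} C_k}{e^{-\beta} + (1 - e^{-\beta})(1 - C_k)}. \tag{28}$$

As in the parity tree case, after minimizing the free energy with respect to $C_k$, taking the
zero temperature limit $\beta \to \infty$ and using the identity (A6), we obtain

$$C_k = \frac{p - D}{1 - 2D}, \tag{29}$$

$$D = \frac{e^{-\beta}}{1 + e^{-\beta}}. \tag{30}$$

Finally, using the zero entropy criterion, one can get

$$R = H_2(p) - H_2(D), \tag{31}$$

which is identical to the rate-distortion function (2). However, here also, it is not easy to
discuss the existence of an optimal $k$. Such an optimal $\hat{k}$ satisfies the following
equation

$$2^{-K} \sum_{l=0}^{K} \binom{K}{l}\, \Theta\!\left(\hat{k}^2 - \frac{1}{K}(2l - K)^2\right) = \frac{p - D}{1 - 2D}. \tag{32}$$

This existence problem is discussed later, when checking the AT stability of the RS solution.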
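Since $C_k$ in (27) is a finite sum of step functions, the set of values it can take is easy to enumerate. The sketch below is a small illustration with hypothetical function names: it evaluates $C_k$ directly and lists, for a given $K$, every value that $C_k$ can reach as $k$ varies. The step function is taken as $\Theta(x) = 1$ for $x > 0$ and $0$ otherwise.

```python
import math


def C_k(k: float, K: int) -> float:
    """Equation (27) with Theta(x) = 1 for x > 0 and 0 otherwise."""
    total = sum(math.comb(K, l) for l in range(K + 1)
                if (2 * l - K) ** 2 / K < k * k)
    return total / 2 ** K


def attainable_values(K: int):
    """All values taken by C_k as k sweeps from 0 to infinity."""
    weights = {}
    for l in range(K + 1):
        v = (2 * l - K) ** 2  # ordering of the thresholds; the 1/K factor is common
        weights[v] = weights.get(v, 0) + math.comb(K, l)
    values, running = [0.0], 0
    for v in sorted(weights):
        running += weights[v]
        values.append(running / 2 ** K)
    return values


if __name__ == "__main__":
    for K in (2, 3, 4, 5):
        print(K, attainable_values(K))
```

For $K = 2$ the value $1/2$ is reached for every $k$ with $0 < k < \sqrt{2}$, whereas for $K = 3$ the attainable values are $0$, $3/4$ and $1$, so the unbiased target $(p - D)/(1 - 2D) = 1/2$ cannot be met.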
V. ALMEIDA-THOULESS STABILITY OF THE REPLICA SYMMETRIC SOLUTION

In this section we check the AT stability (see [25]) of the RS solution of each scheme.
We use the same method as in [22]. The main mathematical points of the AT stability
study are given in Appendix B.




A. AT stability for the parity tree with non-monotonic hidden units 



In the case of a parity tree with non-monotonic hidden units, we find 



■K 



X 



[1 - mik)] 



K-l 



{eP + 1) + {e^ - l)y[l - 4H{k)] 



K 



Q = R = P' = Q' = R' = 0. 
Therefore, using equation (B9), the RS stability criterion is given by



(33) 



R > -KPe 

TT 



-'Ue^ 



(e/3 + 1) + (e/3 - l)y[l - 4i7(fc)]^ 



(34) 



where $\beta$ is given by (16), and where $k$ satisfies equation (18).
$\langle \ldots \rangle_y$ denotes the expectation with respect to (3).

For $p = \frac{1}{2}$, that is to say for an unbiased message $\mathbf{y}$, $k$ satisfies the
equation $H(k) = \frac{1}{4}$, which implies $[1 - 4H(k)] = 0$, and so the AT line is given by
the line $R = 0$. Consequently, for unbiased messages, the RS solution is always AT stable.

Figure 3 shows the rate-distortion function plotted with the AT stability line for biased
messages with $p = 0.2$. The whole region below the AT line is unstable. Since no part of the
rate distortion function is under the AT line, the RS solution is always stable. We repeated
the same calculation for higher values of $K$ and never found any unstable part.

The lossy compression scheme using a parity tree with non-monotonic hidden units
presents good properties. It saturates the Shannon bound for any value of $K \ge 2$, and
the RS solution seems to be always AT stable.
FIG. 3: AT line and rate distortion function for the parity tree with 3 hidden units. The rate 
distortion performance of the parity tree is given by the rate distortion function. The original 
message is biased with bias p = 0.2. The rate distortion function is always above the AT line and 
thus, the RS solution is always stable. 

B. AT stability for the committee tree with non-monotonic hidden units 



In the case of a committee tree with non-monotonic hidden units, we find



P = R-^Ip 

+ (1-P) 



1 + (e« - l)Bk 
{eP-l){Bl-B,) 



n 2' 



Q = R = P' = Q' = R' = 



(35) 



where 



Bl 



ri=±l 
K 

1=2 



■ K 



l + ri(l 



4fce-'= /2 



27r 



m[k]) 



1=1 

1 + nil -AH 



(36) 
Therefore, using equation (B9), the RS stability criterion is given by



R > K {p 



e^-l)iB-,-Bl) 
1 + (e/3 _ i)Bi. 



+a-p) 



l + (e/3-i)(i-5^) 



-\ 2 



(37) 



where $\beta$ is given by (23), and where $k$ satisfies equation (25).







FIG. 4: AT line and rate distortion function for the committee tree with 3 non-monotonic hidden 
units. The rate distortion performance of the committee tree is given by the rate distortion function. 
The original message is unbiased (p = 0.5). The rate distortion function is always above the AT 
line and thus, the RS solution is always stable. 

However, as mentioned in the previous section, it is not clear whether there exists a $k$
which makes equation (25) true. Nevertheless, we did some numerical calculations for $K = 3$
and $K = 5$ (in this case we consider only odd values of $K$, as mentioned earlier), and
always found an optimal $k$ ($= \hat{k}$) in those cases.

We present here the results obtained for $K = 3$. Figures 4 and 5 show the rate-distortion
function plotted with the AT stability line for unbiased ($p = 0.5$) and biased ($p = 0.2$)
messages. Since no part of the rate distortion function is under the AT line, the RS solution
is always stable. We also tried higher values of $K$, and no unstable part was found for
the RS solution.

FIG. 5: AT line and rate distortion function for the committee tree with 3 non-monotonic hidden
units. The rate distortion performance of the committee tree is given by the rate distortion function.
The original message is biased (p = 0.2). The rate distortion function is always above the AT line
and thus, the RS solution is always stable.

The lossy compression scheme using a committee tree with non-monotonic hidden units 
also presents good properties. If an optimal k exists (which seems to be always true), it 
saturates the Shannon bound and the RS solution seems to be always AT stable. 



C. AT stability for the committee tree with a non-monotonic output unit 

In the case of a committee tree with a non-monotonic output unit, we find 



P' 



TT^ 



:i _ e/3)22-2(i^-2) 



X 



K-2 



( "^^^ "I - ( "^7^ ] 



^M 



-11 



P = Q = R = Q' = R' = 0^ 



(38) 
where $\lceil x \rceil$ denotes the ceiling function
($\lceil x \rceil = \min\{n \in \mathbb{Z} \mid n \ge x\}$) and $\lfloor x \rfloor$ denotes the
floor function ($\lfloor x \rfloor = \max\{n \in \mathbb{Z} \mid n \le x\}$). Therefore, using
equation (B9), the RS stability criterion is given by

R > :^(^_zil(i_e^)22-2(^-2) 

7 ^- ^_C ^z^. V 



7r2 



X 



2 I 1-2 



(39) 



where $\beta$ is given by (30), and where $k$ satisfies equation (32).
$\langle \ldots \rangle_y$ denotes the same expectation as in the parity tree case.

However, as mentioned in the previous section, here also it is not clear whether there exists
a $k$ such that equation (32) is satisfied. On top of that, the function $C_k$ is not a
continuous function of $k$: it is a step function of $k$. Therefore, there might be no $k$
satisfying equation (32). On the other hand, since $C_k$ is a step function of $k$, if we find
a $k$ which satisfies equation (32), then $\hat{k}$ is not unique but is given by an optimal
interval, every element of which satisfies (32). We did some numerical experiments for
unbiased messages ($p = 0.5$). For the special case of $K = 2$, equation (32) is clearly
satisfied for any $k \in\, ]0, \sqrt{2}[$, so that in this case $\hat{k}$ is given by any
element of the interval $]0, \sqrt{2}[$. But for $K > 2$ (we checked up to $K = 100$), we did
not find any optimal $k$. We did the same for biased messages ($p = 0.2$) with a fixed
distortion $D = 0.1$, and for any $K \le 100$ no optimal $k$ exists. This implies that in the
general case, the committee tree with a non-monotonic output unit does not saturate the
Shannon bound. However, if the number of hidden units $K$ becomes very large, we can apply the
central limit theorem to replace the sum over the hidden unit outputs in (7) by a Gaussian
variable. Under these conditions, the expression of $C_k$ becomes very simple,

$$C_k = 1 - 2H(k). \tag{40}$$

In this case, $C_k$ is no longer a step function but a continuous function of $k$, and it is
easy to see that there is always a $k$ which satisfies the equation

$$1 - 2H(k) = \frac{p - D}{1 - 2D}. \tag{41}$$

Let us denote it by $k_{\inf}$. So in the large $K$ limit, $\hat{k} = k_{\inf}$ exists and is
unique. The compression scheme with a committee tree using a non-monotonic output unit
saturates the Shannon bound in this limit. It is however hard to check the AT stability for
an infinite number of hidden units $K$ (the binomial coefficient follows a factorial growth),
but we claim that the solution obtained by the RS ansatz is always AT stable (except for some
very narrow region where $D \approx p$). We show in Figures 6 and 7 the rate distortion
function plotted with the AT line for $K = 50$ hidden units, for unbiased ($p = 0.5$) and
biased ($p = 0.2$) messages.







FIG. 6: AT line and rate distortion function for the committee tree with a non-monotonic output 
unit. The rate distortion performance of the committee tree is given by the rate distortion function. 
The original message is unbiased {p = 0.5). The AT line is calculated using K = 50. The rate 
distortion function is always above the AT line and thus, the RS solution is always stable. 

To sum up this subsection, the lossy compression scheme using a committee tree with a
non-monotonic output unit presents a quite complex structure, which does not saturate the
Shannon bound in most cases. However, it does saturate it when the number of hidden units
becomes infinite. Concerning the AT stability, the committee tree with a non-monotonic
output unit does not seem to exhibit any critical instability for the RS solution.




FIG. 7: AT line and rate distortion function for the committee tree with a non-monotonic output 
unit. The rate distortion performance of the committee tree is given by the rate distortion function. 
The original message is biased {p = 0.2). The AT line is calculated using K = 50. The rate 
distortion function is always above the AT line except for a very narrow region where D ≈ p.

VI. CONCLUSION AND DISCUSSION 



We investigated a lossy compression scheme for uniformly biased Boolean messages using
non-monotonic parity tree and non-monotonic committee tree multilayer perceptrons. All
the schemes were shown to saturate the Shannon bound under some specific conditions. The
replica symmetric solution was found to be AT stable, apart from a very narrow region for the
committee tree with a non-monotonic output unit, which tends to confirm the validity of the
replica symmetric ansatz.

The Edwards-Anderson order parameter $q$ was always found to be 0, meaning that codewords
are uncorrelated in the codeword space. As already mentioned in [9], [11], one may
conjecture that this is a necessary condition for a lossy compression scheme to achieve the
Shannon limit. Mirror symmetry then seems to be an essential feature for saturating the
Shannon bound. The committee tree with non-monotonic hidden units corresponds to the
same committee tree model as in the paper of Mimura et al. [11], with the exception of the
hidden layer transfer function, which is given by the non-monotonic transfer function $f_k$
in this paper. By enforcing mirror symmetry in the hidden layer, we were able to get Shannon
optimal performance for an infinite codeword length independently of the number of hidden
units, whereas this was not possible using the model of Mimura et al., even for an infinite
number of hidden units. In the same way, keeping the monotonic sgn function as the transfer
function of the hidden layer and transforming only the output unit into a non-monotonic one
by the use of $f_k$ (i.e., the committee tree with a non-monotonic output unit), we were able
to reach the Shannon limit with an infinite number of hidden units. Once again, enforcing
mirror symmetry made it possible to get Shannon optimal performance.

Next, one can easily derive a lower bound for the number of optimal codewords (here optimal
means a codeword which gives the minimum achievable distortion of the corresponding
scheme) for each of the three schemes. In the case of the parity tree and committee tree
with non-monotonic hidden units, there are at least $2^K$ optimal codewords in the codeword
space. Indeed, if $\mathbf{s}$ denotes an optimal codeword, we can replace any of its
components $\mathbf{s}_l$ by $-\mathbf{s}_l$ without altering the output of the hidden layer,
and thus leave the output of the network unchanged. In the case of the committee tree with a
non-monotonic output unit, because of the more complex structure of the hidden layer, we can
only guarantee the existence of 2 optimal codewords, which are given by $\mathbf{s}$ and
$-\mathbf{s}$.

However, an encoder which implements the minimization (4) directly would require a
computational cost which grows exponentially with the original message length. We need more
efficient algorithms to reduce the encoding time. A preliminary study made by Hosaka et
al. [20] uses the BP algorithm for this task. This could be a good solution to achieve the
encoding phase in polynomial time. Another possibility is to use the survey propagation
approach, which was developed for satisfiability problems [26]. Furthermore, as
mentioned above, for the parity tree and committee tree with non-monotonic hidden units the
number of optimal codewords grows exponentially with the number of hidden units. This could
make the search for one optimal codeword easier to achieve using some proper heuristics.
This issue will be studied in a future paper.
APPENDIX A: ANALYTICAL EVALUATION USING THE REPLICA METHOD

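The free energy can be evaluated by the replica method (the parameter $k$ is fixed here),
where $Z(\beta, \mathbf{y}, \mathbf{x})^n$ denotes the $n$-times replicated partition function

$$Z(\beta, \mathbf{y}, \mathbf{x})^n = \sum_{\mathbf{s}^1, \ldots, \mathbf{s}^n} \prod_{a=1}^{n} \exp\left[-\beta \mathcal{H}(\mathbf{y}, \hat{\mathbf{y}}(\mathbf{s}^a))\right]. \tag{A2}$$
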

The vector s"" is given by s" = (s", . . . , s^-) and the superscript a denotes the replica index. 
By using the zero entropy criterion [23,], we have 

= (3[U - F] 

u = f, (A3) 

where $U$ denotes the internal energy and $F$ the free energy; $u$ and $f$ denote the same
quantities per bit ($u = U/N$, $f = F/N$). In the zero entropy limit, only one state of the
dynamical variable $s$ achieves a distortion per bit less than or equal to $D$. The free energy
per bit

$$
f(\beta, R) = -\frac{1}{\beta N} \big\langle \ln Z(\beta, y, x) \big\rangle_{y,x} \tag{A4}
$$

is equal to the internal energy per bit

$$
u(\beta) = \frac{\partial\,[\beta f]}{\partial \beta}. \tag{A5}
$$

This result, $f(\beta, R) = u(\beta)$, gives us an explicit relation between the code rate $R$ and the
inverse temperature $\beta$.
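The link between zero entropy and the equality of free and internal energy can be made explicit with standard thermodynamic relations. The short derivation below is a sketch that uses the usual definitions; the symbol $s_{\mathrm{ent}}$ for the entropy per bit is introduced here for illustration only, to avoid a clash with the codeword $s$.

```latex
u = \frac{\partial (\beta f)}{\partial \beta}
  = f + \beta \frac{\partial f}{\partial \beta},
\qquad
s_{\mathrm{ent}} = -\frac{\partial f}{\partial T}
  = \beta^{2} \frac{\partial f}{\partial \beta}
\quad (T = 1/\beta)
\quad\Longrightarrow\quad
s_{\mathrm{ent}} = \beta\,(u - f).
```

At finite $\beta$, the condition $s_{\mathrm{ent}} = 0$ is therefore equivalent to $u = f$.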

Since this temperature was artificially introduced by means of the parameter $\beta$, we should
get rid of it by taking the zero temperature limit ($\beta \to +\infty$), where the dynamical variable
freezes. In this limit, one can retrieve the codeword which minimizes the free energy and
gives the best achievable code rate. However, since a distortion per bit $D$ is tolerated, at the
zero temperature limit the internal energy per bit should be equal to this distortion. This
motivates the introduction of the following identity:

$$
\lim_{\beta \to +\infty} u(\beta) = D. \tag{A6}
$$

Finally, in this zero temperature limit, one obtains an explicit relation which binds the best
achievable code rate $R$ to the distortion $D$:

$$
f(D, R) = D. \tag{A7}
$$



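We now proceed to the calculation of the replicated partition function (A2). Inserting
the identity

$$
1 = \prod_{a<b}^{n} \prod_{l=1}^{K} \int dq_l^{ab}\, \delta\!\left(s_l^a \cdot s_l^b - \frac{N}{K}\, q_l^{ab}\right)
= \left(\frac{1}{2\pi i}\right)^{n(n-1)K/2} \int \Big(\prod_{a<b} \prod_{l} dq_l^{ab}\, d\hat{q}_l^{ab}\Big)
\exp\left[\sum_{a<b} \sum_{l} \hat{q}_l^{ab} \left(s_l^a \cdot s_l^b - \frac{N}{K}\, q_l^{ab}\right)\right] \tag{A8}
$$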

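into (A2) enables us to separate the relevant order parameters and to calculate the average
moment $\langle Z(\beta, y, x)^n \rangle_{y,x}$ for natural numbers $n$ as

$$
\big\langle Z(\beta, y, x)^n \big\rangle_{y,x} \propto \int \Big(\prod_{a<b} \prod_{l} dq_l^{ab}\, d\hat{q}_l^{ab}\Big)
\exp N \Bigg[ R^{-1} \ln \Big\langle \int \Big(\prod_{l=1}^{K} \frac{du_l}{\sqrt{(2\pi)^n |Q_l|}}\, e^{-\frac{1}{2} u_l^{\top} Q_l^{-1} u_l}\Big) \prod_{a=1}^{n} \big\{ e^{-\beta} + (1 - e^{-\beta})\, \Theta(y, \{u_l^a\}) \big\} \Big\rangle_y
+ \frac{1}{K} \ln \sum_{\{s_l^a\}} \exp\Big( \sum_{a<b} \sum_{l} \hat{q}_l^{ab} s_l^a s_l^b \Big)
- \frac{1}{K} \sum_{a<b} \sum_{l} q_l^{ab} \hat{q}_l^{ab} \Bigg], \tag{A9}
$$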

where $Q_l$ is an $n \times n$ matrix having elements $\{q_l^{ab}\}$ and where $\langle \cdots \rangle_y$ denotes the expectation
with respect to (3). The function $\Theta(y, \{u_l^a\})$ depends on the decoder and will be discussed in
the following subsections. We analyze the scheme in the thermodynamic limit $N, M \to +\infty$,
while the code rate $R$ is kept finite. In this limit, (A9) can be evaluated using the saddle
point method with respect to $q_l^{ab}$ and $\hat{q}_l^{ab}$. To continue the calculation, we have to make
some assumptions about the structure of these order parameters. We use here the so-called
replica symmetric (RS) ansatz

$$
q_l^{ab} = (1 - q)\,\delta_{ab} + q, \qquad \hat{q}_l^{ab} = (1 - \hat{q})\,\delta_{ab} + \hat{q}, \tag{A10}
$$

where $\delta_{ab}$ denotes the Kronecker delta. This ansatz means that all the hidden units are
equivalent after averaging over the disorder.
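Under the RS ansatz, the sum over replicated spins is usually decoupled with a Gaussian integral. The following is a minimal numerical sketch of that step for a single site, assuming all off-diagonal conjugate parameters equal $\hat{q}$: it compares a brute-force sum over the $2^n$ replica configurations with $e^{-n\hat{q}/2} \int Du\, [2\cosh(\sqrt{\hat{q}}\, u)]^n$. The function names and the quadrature order are illustrative choices.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def replicated_sum_brute_force(n, q_hat):
    """Sum over s^1..s^n = +-1 of exp(q_hat * sum_{a<b} s^a s^b)."""
    total = 0.0
    for s in itertools.product([-1, 1], repeat=n):
        pair_sum = sum(s[a] * s[b] for a in range(n) for b in range(a + 1, n))
        total += np.exp(q_hat * pair_sum)
    return total

def replicated_sum_gaussian(n, q_hat, order=60):
    """exp(-n q_hat / 2) * int Du (2 cosh(sqrt(q_hat) u))^n via Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(order)   # weight function exp(-u^2 / 2)
    integrand = (2.0 * np.cosh(np.sqrt(q_hat) * nodes)) ** n
    gaussian_average = np.dot(weights, integrand) / np.sqrt(2.0 * np.pi)
    return np.exp(-n * q_hat / 2.0) * gaussian_average

for n, q_hat in [(2, 0.3), (3, 0.25), (4, 0.6)]:
    print(n, q_hat, replicated_sum_brute_force(n, q_hat), replicated_sum_gaussian(n, q_hat))
```

The Gaussian form depends on $n$ only through an exponent, which is what allows the continuation to $n \to 0$ and produces terms of the type $\int Du \ln[2\cosh(\sqrt{\hat{q}}\, u)] - \hat{q}(1 - q)/2$.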



1. Replica symmetric evaluation for the parity tree with non-monotonic hidden units



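In the case of a parity tree with non-monotonic hidden units, the function $\Theta(y, \{u_l^a\})$ in
(A9) is given by

$$
\Theta(y, \{u_l^a\}) = \Theta\Big( y \prod_{l=1}^{K} \mathrm{sgn}\big[k^2 - (u_l^a)^2\big] \Big). \tag{A11}
$$

We can then obtain the expression of the free energy as

$$
f(\beta, R, k, q, \hat{q}) = -\frac{1}{\beta} \operatorname*{extr}_{q, \hat{q}} \Bigg\{ \Big\langle \int_{-\infty}^{+\infty} \Big(\prod_{l=1}^{K} Dt_l\Big) \ln\big[e^{-\beta} + (1 - e^{-\beta})\, \Pi_k(\{t_l\}, y)\big] \Big\rangle_y
+ R \int_{-\infty}^{+\infty} Du\, \ln\big[2\cosh(\sqrt{\hat{q}}\, u)\big] - \frac{R}{2}\, \hat{q}(1 - q) \Bigg\}, \tag{A12}
$$

where

$$
\Pi_k(\{t_l\}, y) = \frac{1}{2} + \frac{y}{2} \prod_{l=1}^{K} \left[ 1 - 2H\!\left(\frac{k - \sqrt{q}\, t_l}{\sqrt{1 - q}}\right) - 2H\!\left(\frac{k + \sqrt{q}\, t_l}{\sqrt{1 - q}}\right) \right], \tag{A13}
$$

with $H(x) = \int_x^{+\infty} Dz$ and $Dz = dz\, e^{-z^2/2}/\sqrt{2\pi}$.

Taking the derivatives of (A12) with respect to $q$ and $\hat{q}$ gives the saddle point equations for
the order parameters

$$
\hat{q} = -2R^{-1} \Big\langle \int_{-\infty}^{+\infty} \Big(\prod_{l=1}^{K} Dt_l\Big) \frac{(1 - e^{-\beta})\, \Pi_k'(\{t_l\}, y)}{e^{-\beta} + (1 - e^{-\beta})\, \Pi_k(\{t_l\}, y)} \Big\rangle_y, \qquad
q = \int_{-\infty}^{+\infty} Du\, \tanh^2\big(\sqrt{\hat{q}}\, u\big), \tag{A14}
$$

where $\Pi_k'(\{t_l\}, y) = \partial \Pi_k(\{t_l\}, y)/\partial q$.

We solved these saddle point equations numerically and found that the solution is given by
$q = \hat{q} = 0$. According to [9], [11], this result was expected and implies that all the codewords
are uncorrelated and distributed all around S^ . Substituting $q = \hat{q} = 0$ into (A12), one can
finally find the free energy given by (12).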
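The function $\Pi_k$ has a simple probabilistic reading that can be cross-checked numerically. The sketch below assumes the usual RS decomposition $u_l = \sqrt{q}\, t_l + \sqrt{1 - q}\, z_l$ with independent standard Gaussian $z_l$, the closed form $\Pi_k = \frac{1}{2} + \frac{y}{2} \prod_l [1 - 2H((k - \sqrt{q}\, t_l)/\sqrt{1 - q}) - 2H((k + \sqrt{q}\, t_l)/\sqrt{1 - q})]$ with $H(x) = \int_x^{+\infty} Dz$, and interprets $\Pi_k$ as the probability that $\prod_l \mathrm{sgn}[k^2 - u_l^2]$ equals $y$. Function names, sample size and parameter values are illustrative.

```python
import math
import numpy as np

def H(x):
    """Gaussian tail probability H(x) = int_x^inf Dz."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pi_k(t, y, k, q):
    """Closed form: 1/2 + (y/2) * prod_l [1 - 2H(.) - 2H(.)]."""
    prod = 1.0
    for t_l in t:
        lower = (k - math.sqrt(q) * t_l) / math.sqrt(1.0 - q)
        upper = (k + math.sqrt(q) * t_l) / math.sqrt(1.0 - q)
        prod *= 1.0 - 2.0 * H(lower) - 2.0 * H(upper)
    return 0.5 + 0.5 * y * prod

def pi_k_monte_carlo(t, y, k, q, samples=200_000, seed=0):
    """Monte Carlo estimate of P(prod_l sgn[k^2 - u_l^2] = y) for fixed t."""
    rng = np.random.default_rng(seed)
    t = np.asarray(t, dtype=float)
    z = rng.standard_normal((samples, t.size))
    u = math.sqrt(q) * t + math.sqrt(1.0 - q) * z
    hidden = np.where(np.abs(u) < k, 1, -1)
    return float(np.mean(y * hidden.prod(axis=1) > 0))

t = [0.3, -1.1, 0.7]          # illustrative values of t_1..t_K (K = 3)
print(pi_k(t, +1, 0.9, 0.4), pi_k_monte_carlo(t, +1, 0.9, 0.4))
```

At $q = 0$ the dependence on $\{t_l\}$ disappears and the closed form reduces to $\frac{1}{2} + \frac{y}{2}\,[1 - 4H(k)]^K$, so the threshold $k$ directly controls the bias of the decoder output.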



2. Replica symmetric evaluation for the committee tree with non-monotonic hidden units

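In the case of a committee tree with non-monotonic hidden units, the function $\Theta(y, \{u_l^a\})$
in (A9) is given by

$$
\Theta(y, \{u_l^a\}) = \Theta\Big( y \sum_{l=1}^{K} \mathrm{sgn}\big[k^2 - (u_l^a)^2\big] \Big). \tag{A15}
$$

We can then obtain the expression of the free energy as

$$
f(\beta, R, k, q, \hat{q}) = -\frac{1}{\beta} \operatorname*{extr}_{q, \hat{q}} \Bigg\{ \Big\langle \int_{-\infty}^{+\infty} \Big(\prod_{l=1}^{K} Dt_l\Big) \ln\big[e^{-\beta} + (1 - e^{-\beta})\, \Sigma_k(\{t_l\}, y)\big] \Big\rangle_y
+ R \int_{-\infty}^{+\infty} Du\, \ln\big[2\cosh(\sqrt{\hat{q}}\, u)\big] - \frac{R}{2}\, \hat{q}(1 - q) \Bigg\}, \tag{A16}
$$

where

$$
\Sigma_k(\{t_l\}, y) = \sum_{\tau_1, \dots, \tau_K = \pm 1} \Theta\Big( y \sum_{l=1}^{K} \tau_l \Big)
\prod_{l=1}^{K} \left[ \frac{1 + \tau_l}{2} - \tau_l H\!\left(\frac{k + \sqrt{q}\, t_l}{\sqrt{1 - q}}\right) - \tau_l H\!\left(\frac{k - \sqrt{q}\, t_l}{\sqrt{1 - q}}\right) \right]. \tag{A17}
$$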



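Taking the derivatives of (A16) with respect to $q$ and $\hat{q}$ gives the saddle point equations for
the order parameters

$$
\hat{q} = -2R^{-1} \Big\langle \int_{-\infty}^{+\infty} \Big(\prod_{l=1}^{K} Dt_l\Big) \frac{(1 - e^{-\beta})\, \Sigma_k'(\{t_l\}, y)}{e^{-\beta} + (1 - e^{-\beta})\, \Sigma_k(\{t_l\}, y)} \Big\rangle_y, \qquad
q = \int_{-\infty}^{+\infty} Du\, \tanh^2\big(\sqrt{\hat{q}}\, u\big), \tag{A18}
$$

where $\Sigma_k'(\{t_l\}, y) = \partial \Sigma_k(\{t_l\}, y)/\partial q$.

We solved these saddle point equations numerically and here also we found that the solution
is given by $q = \hat{q} = 0$. Substituting $q = \hat{q} = 0$ into (A16), one can finally find the free
energy given by (19).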



3. Replica symmetric evaluation for the committee tree with a non-monotonic 
output unit 

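In the case of a committee tree with a non-monotonic output unit, the function $\Theta(y, \{u_l^a\})$
in (A9) is given by

$$
\Theta(y, \{u_l^a\}) = \Theta\Bigg( y \bigg\{ k^2 - \Big[ \frac{1}{\sqrt{K}} \sum_{l=1}^{K} \mathrm{sgn}\big[u_l^a\big] \Big]^2 \bigg\} \Bigg). \tag{A19}
$$

We can then obtain the expression of the free energy as

$$
f(\beta, R, k, q, \hat{q}) = -\frac{1}{\beta} \operatorname*{extr}_{q, \hat{q}} \Bigg\{ \Big\langle \int_{-\infty}^{+\infty} \Big(\prod_{l=1}^{K} Dt_l\Big) \ln\big[e^{-\beta} + (1 - e^{-\beta})\, F_{\Sigma,k}(\{t_l\}, y)\big] \Big\rangle_y
+ R \int_{-\infty}^{+\infty} Du\, \ln\big[2\cosh(\sqrt{\hat{q}}\, u)\big] - \frac{R}{2}\, \hat{q}(1 - q) \Bigg\}, \tag{A20}
$$

where

$$
F_{\Sigma,k}(\{t_l\}, y) = \sum_{\tau_1, \dots, \tau_K = \pm 1} \Theta\Bigg( y k^2 - \frac{y}{K} \Big( \sum_{l=1}^{K} \tau_l \Big)^2 \Bigg)
\prod_{l=1}^{K} H\!\left( -\tau_l \sqrt{\frac{q}{1 - q}}\; t_l \right). \tag{A21}
$$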



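Taking the derivatives of (A20) with respect to $q$ and $\hat{q}$ gives the saddle point equations for
the order parameters

$$
\hat{q} = -2R^{-1} \Big\langle \int_{-\infty}^{+\infty} \Big(\prod_{l=1}^{K} Dt_l\Big) \frac{(1 - e^{-\beta})\, F_{\Sigma,k}'(\{t_l\}, y)}{e^{-\beta} + (1 - e^{-\beta})\, F_{\Sigma,k}(\{t_l\}, y)} \Big\rangle_y, \qquad
q = \int_{-\infty}^{+\infty} Du\, \tanh^2\big(\sqrt{\hat{q}}\, u\big), \tag{A22}
$$

where $F_{\Sigma,k}'(\{t_l\}, y) = \partial F_{\Sigma,k}(\{t_l\}, y)/\partial q$.

We solved these saddle point equations numerically and here also we found that the solution
is given by $q = \hat{q} = 0$. Substituting $q = \hat{q} = 0$ into (A20), one can finally find the free
energy given by (26).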
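All three evaluations end at $q = \hat{q} = 0$, where the quantity multiplying $(1 - e^{-\beta})$ no longer depends on $\{t_l\}$. The following is a minimal sketch of how a zero entropy condition then ties $R$ to $D$. It rests on several assumptions that are not taken verbatim from the text: the free energy is taken to be $f = -\beta^{-1} \{ \langle \ln[e^{-\beta} + (1 - e^{-\beta}) P_y] \rangle_y + R \ln 2 \}$ with $P_y = (1 + y\, m)/2$, where $m$ (a symbol introduced here) is the mean decoder output over random codewords, $p$ denotes $P(y = +1)$, and the conditions $u(\beta) = D$ and $f = u$ are imposed at finite $\beta$. Function names and bisection bounds are illustrative.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def rate_from_zero_entropy(p, D, m):
    """Return (R, beta) solving u(beta) = D and f = u for message bias p and output mean m."""
    P = {+1: 0.5 * (1.0 + m), -1: 0.5 * (1.0 - m)}   # P_y: chance a random codeword matches y
    w = {+1: p, -1: 1.0 - p}                          # message distribution

    def g(beta):
        # <ln[e^{-beta} + (1 - e^{-beta}) P_y]>_y
        e = math.exp(-beta)
        return sum(w[y] * math.log(e + (1.0 - e) * P[y]) for y in (+1, -1))

    def u(beta):
        # internal energy per bit, u = d(beta f)/d(beta) = -g'(beta)
        e = math.exp(-beta)
        return sum(w[y] * e * (1.0 - P[y]) / (e + (1.0 - e) * P[y]) for y in (+1, -1))

    lo, hi = 0.0, 50.0            # u(beta) decreases monotonically on this interval
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if u(mid) > D:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # f = u  <=>  -g(beta) - R ln 2 = beta * D
    rate = (-g(beta) - beta * D) / math.log(2.0)
    return rate, beta

p, D = 0.7, 0.1
m_tuned = (2.0 * p - 1.0) / (1.0 - 2.0 * D)   # output bias of the optimal test channel
print(rate_from_zero_entropy(p, D, m_tuned)[0], h2(p) - h2(D))
```

With $m$ tuned to $(2p - 1)/(1 - 2D)$, the rate returned by this sketch coincides with $h_2(p) - h_2(D)$, the rate-distortion function of a biased binary source under Hamming distortion [21]; other values of $m$ give a larger rate, which illustrates why the non-monotonic transfer function has to be tuned.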

APPENDIX B: ALMEIDA-THOULESS STABILITY CRITERION 

The Hessian computed at the RS saddle point characterizes fluctuations in the order
parameters $q_l^{ab}$ and $\hat{q}_l^{ab}$ around the RS saddle point. Instability of the RS solution is signaled
by a change of sign of at least one of the eigenvalues of the Hessian. Let $\mathcal{M}(\{q_l^{ab}\}, \{\hat{q}_l^{ab}\})$ be
the exponent of the integrand of integral (A9). Equation (A9) can be represented as

$$
\big\langle Z(\beta, y, x)^n \big\rangle_{y,x} \propto \int \Big(\prod_{a<b} \prod_{l} dq_l^{ab}\, d\hat{q}_l^{ab}\Big)
\exp\Big( N \mathcal{M}(\{q_l^{ab}\}, \{\hat{q}_l^{ab}\}) \Big). \tag{B1}
$$



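We expand $\mathcal{M}$ around $q$ and $\hat{q}$ in $\delta q_l^{ab}$ and $\delta \hat{q}_l^{ab}$ and then find, up to second order,

$$
\mathcal{M}(\{q + \delta q_l^{ab}\}, \{\hat{q} + \delta \hat{q}_l^{ab}\}) = \mathcal{M}(\{q\}, \{\hat{q}\}) + \frac{1}{2}\, v^{\top} G\, v + o(\|v\|^2), \tag{B2}
$$

where

$$
v = \big( \{\delta q_1^{ab}\}, \{\delta \hat{q}_1^{ab}\}, \dots, \{\delta q_K^{ab}\}, \{\delta \hat{q}_K^{ab}\} \big) \tag{B3}
$$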



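is the perturbation to the RS saddle point. The Hessian $G$ is the following $[Kn(n-1)] \times
[Kn(n-1)]$ matrix,

$$
G = \begin{pmatrix}
U & V & \cdots & V \\
V & U & \cdots & V \\
\vdots & & \ddots & \vdots \\
V & V & \cdots & U
\end{pmatrix}, \tag{B4}
$$

where the $n(n-1) \times n(n-1)$ matrices $U$ and $V$ are

$$
U = \begin{pmatrix}
\{U^{ab,cd}\} & \{\tilde{U}^{ab,cd}\} \\
\{\tilde{U}^{ab,cd}\} & \{\hat{U}^{ab,cd}\}
\end{pmatrix}, \tag{B5}
$$

$$
V = \begin{pmatrix}
\{V^{ab,cd}\} & \{\tilde{V}^{ab,cd}\} \\
\{\tilde{V}^{ab,cd}\} & \{\hat{V}^{ab,cd}\}
\end{pmatrix}, \tag{B6}
$$

with

$$
U^{ab,cd} = \frac{\partial^2 \mathcal{M}}{\partial q_l^{ab}\, \partial q_l^{cd}}, \qquad
\hat{U}^{ab,cd} = \frac{\partial^2 \mathcal{M}}{\partial \hat{q}_l^{ab}\, \partial \hat{q}_l^{cd}}, \qquad
\tilde{U}^{ab,cd} = \frac{\partial^2 \mathcal{M}}{\partial q_l^{ab}\, \partial \hat{q}_l^{cd}}, \tag{B7}
$$

$$
V^{ab,cd} = \frac{\partial^2 \mathcal{M}}{\partial q_l^{ab}\, \partial q_{l'}^{cd}}, \qquad
\hat{V}^{ab,cd} = \frac{\partial^2 \mathcal{M}}{\partial \hat{q}_l^{ab}\, \partial \hat{q}_{l'}^{cd}}, \qquad
\tilde{V}^{ab,cd} = \frac{\partial^2 \mathcal{M}}{\partial q_l^{ab}\, \partial \hat{q}_{l'}^{cd}} \qquad (l \neq l'). \tag{B8}
$$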
For $q$, $\hat{q}$ to be a local maximum of $\mathcal{M}$, it is necessary for the Hessian $G$ to be negative definite
(i.e., all of its eigenvalues must be negative).
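The block structure of $G$, with the same matrix $U$ on every diagonal block and the same matrix $V$ on every off-diagonal block, already fixes how its spectrum decomposes: the eigenvalues of $G$ are those of $U + (K - 1)V$ (once) and those of $U - V$ (each $K - 1$ times). The following minimal sketch checks this linear-algebra fact with small random symmetric stand-ins for $U$ and $V$; the block size and the random matrices are illustrative and are not the actual Hessian blocks.

```python
import numpy as np

def block_hessian(U, V, K):
    """Matrix with U on the K diagonal blocks and V on all off-diagonal blocks."""
    eye, ones = np.eye(K), np.ones((K, K))
    return np.kron(eye, U - V) + np.kron(ones, V)

rng = np.random.default_rng(1)
d, K = 4, 3                                   # illustrative block size and number of hidden units
A = rng.standard_normal((d, d))
B = rng.standard_normal((d, d))
U = -(A @ A.T) - d * np.eye(d)                # symmetric stand-in for U
V = 0.1 * (B + B.T)                           # symmetric stand-in for V

G = block_hessian(U, V, K)
eig_G = np.sort(np.linalg.eigvalsh(G))

# Spectrum predicted from the block structure alone.
eig_blocks = np.sort(np.concatenate(
    [np.linalg.eigvalsh(U + (K - 1) * V)] + [np.linalg.eigvalsh(U - V)] * (K - 1)
))

print(np.allclose(eig_G, eig_blocks))
print("largest eigenvalue of G:", eig_G.max())
```

This is why the stability analysis only has to deal with combinations of the form "diagonal block plus $(K - 1)$ times off-diagonal block" and "diagonal block minus off-diagonal block", rather than with the full $[Kn(n-1)] \times [Kn(n-1)]$ matrix.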



To check these eigenvalues, we use the same method as in [11]. We do not give the
mathematical details here. Finally, using Gardner's method [22], we can derive the
criterion for the RS solution to be stable as

$$
\gamma \hat{\gamma} < 1, \tag{B9}
$$

where

$$
\gamma = \gamma_0 + (K - 1)\gamma_1, \qquad
\gamma_0 = P - 2Q + R, \qquad
\gamma_1 = P' - 2Q' + R',
$$

$$
P = U^{ab,ab}, \quad Q = U^{ab,ac} \ (b \neq c), \quad R = U^{ab,cd} \ (a, b, c, d \ \text{all different}), \qquad
P' = V^{ab,ab}, \quad Q' = V^{ab,ac} \ (b \neq c), \quad R' = V^{ab,cd}. \tag{B10}
$$

The line $\gamma \hat{\gamma} = 1$ defines the so-called AT line.



[1] N. Sourlas, Nature, 339, 693 (1989).

[2] Y. Kabashima, T. Murayama and D. Saad, Phys. Rev. Lett., 84, 1355 (2000).

[3] H. Nishimori and K. Y. M. Wong, Phys. Rev. E, 60, 132 (1999).

[4] A. Montanari and N. Sourlas, Eur. Phys. J. B, 18, 107 (2000).

[5] T. Tanaka, Europhys. Lett., 54, 540 (2001).

[6] T. Tanaka and M. Okada, IEEE Trans. Inform. Theory, 51, 2, 700 (2005).

[7] T. Murayama and M. Okada, J. Phys. A: Math. Gen., 36, 11123 (2003).

[8] T. Murayama, Phys. Rev. E, 69, 035105(R) (2004).

[9] T. Hosaka, Y. Kabashima and H. Nishimori, Phys. Rev. E, 66, 066126 (2002).

[10] T. Hosaka and Y. Kabashima, J. Phys. Soc. Jpn., 74, 1, 488 (2005).

[11] K. Mimura and M. Okada, Phys. Rev. E, 74, 026108 (2006).




[12] C. E. Shannon, Bell Syst. Tech. J., 27, 379 (1948). 

[13] R. Gallager, IRE Trans. on Info. Theory, 1968, IT-8, 21 (1968)

[14] D.J.C. MacKay, R.M. Neal, IEEE Electronics Letters, 33(6), 457 (1997) 

[15] T.J. Richardson, R.L. Urbanke, IEEE Trans. Inform. Theory, 47(2), 638 (2001) 

[16] C. E. Shannon, IRE Nat. Conv. Rec, 4, 142 (1959). 

[17] Y. Matsunaga, H. Yamamoto, IEEE Trans. Inform. Theory, 49(9), 2225 (2003) 

[18] E. Martinian, M.J. Wainwright, Data Compression Conference, (2006) 

[19] M.J. Wainwright, IEEE Signal Processing Magazine, 24(5), 47 (2007) 

[20] T. Hosaka, Y. Kabashima, Physica A, 365, 113, (2006) 

[21] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley, New York, 1991). 

[22] E. Gardner, J. Phys. A: Math. Gen., 21, 257 (1988). 

[23] W. Krauth and M. Mezard, J. Phys. (France), 50, 3057 (1989). 

[24] A. Engel and C. Van den Broeck, Statistical Mechanics of Learning (Cambridge University Press, 2001).

[25] J. R. L. de Almeida and D. J. Thouless, J. Phys. A: Math. Gen., 11(5), 983 (1978). 

[26] M. Mezard, G. Parisi, R. Zecchina, Science, 297, 812 (2002) 






